The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive for participation (70%), while prize money played only a minor role (16%). Although a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. Only 37% of the participants performed k-fold cross-validation on the training set, and only 50% performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
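Patch-based training, the most commonly reported workaround for samples that are too large to process at once, can be illustrated with a minimal sketch. The function name `extract_patches`, the 2D list-of-lists image, and the fixed stride are assumptions made for illustration; the survey does not prescribe any particular patching scheme.

```python
from typing import List

Image = List[List[float]]


def extract_patches(image: Image, patch: int, stride: int) -> List[Image]:
    """Cut a 2D image into square patches of side `patch`.

    Hypothetical helper: windows are taken every `stride` pixels, and any
    border region that does not fit a full patch is skipped.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            # Slice `patch` rows, then `patch` columns from each row.
            patches.append([row[left:left + patch]
                            for row in image[top:top + patch]])
    return patches


# Usage: a 4x4 image split into non-overlapping 2x2 patches.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
tiles = extract_patches(image, patch=2, stride=2)
```

Each patch can then be fed to the model independently; choosing a stride smaller than the patch size gives overlapping patches at the cost of more computation.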
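Action recognition is an exciting research avenue for artificial intelligence because it may be a game changer in emerging industrial fields such as robotic vision and automobiles. However, current deep learning faces major challenges in such applications because of its huge computational cost and inefficient learning. We therefore develop a novel brain-inspired system based on spiking neural networks (SNNs), titled Spiking Gating Flow (SGF), for online action learning. The system consists of multiple SGF units assembled in a hierarchical manner. A single SGF unit involves three layers: a feature extraction layer, an event-driven layer, and a histogram-based training layer. To demonstrate the system's capabilities, we use standard Dynamic Vision Sensor (DVS) gesture classification as a benchmark. The results show that we can achieve 87.5% accuracy, which is comparable to deep learning (DL), but at a smaller training/inference data ratio of 1.5:1. Only a single training epoch is required during learning. Meanwhile, to the best of our knowledge, this is the highest accuracy among SNNs based on non-backpropagation algorithms. Finally, we summarize the few-shot learning paradigm of the developed network: 1) a hierarchical network design that incorporates human prior knowledge; 2) SNNs for content-based global dynamic feature detection.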
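Deep neural networks (DNNs) have proven effective in solving many real-world problems, but their high computational cost prevents these models from being deployed on edge devices. Pruning, which introduces zeros into the model weights, has been shown to provide a good trade-off between model accuracy and computational efficiency, and is a widely used method for generating compressed models. However, the granularity of pruning involves an important trade-off. At the same sparsity level, a coarse-grained structured sparse pattern is more efficient on conventional hardware but yields worse accuracy, whereas a fine-grained unstructured sparse pattern achieves better accuracy but is inefficient on existing hardware. On the other hand, some modern processors are equipped with fast on-chip scratchpad memories and gather/scatter engines that perform indirect load and store operations on such memories. In this work, we propose a set of novel sparse patterns, named gather-scatter (GS) patterns, that use scratchpad memories and gather/scatter engines to accelerate neural network inference. Correspondingly, we present a compact sparse format. The proposed sparse patterns, together with a novel pruning method, address the load imbalance problem and produce models with quality close to that of unstructured sparse models and computational efficiency close to that of structured sparse models. Our experiments show that GS patterns consistently achieve a better trade-off between accuracy and computational efficiency than conventional structured sparse patterns. GS patterns can reduce the runtime of DNN components by a factor of two to three at the same accuracy level. This is confirmed on three different deep learning tasks and popular models, namely GNMT for machine translation, ResNet50 for image recognition, and Jasper for acoustic speech recognition.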
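The granularity trade-off described in this abstract can be made concrete with a small sketch that prunes the same weight matrix to the same sparsity in two ways. This is generic magnitude pruning written for illustration only: the function names are hypothetical, and neither function implements the GS patterns, the compact sparse format, or the pruning method proposed in the paper.

```python
from typing import List

Matrix = List[List[float]]


def prune_unstructured(weights: Matrix, sparsity: float) -> Matrix:
    """Fine-grained pruning: zero the smallest-magnitude individual weights."""
    cells = sorted((abs(x), i, j)
                   for i, row in enumerate(weights)
                   for j, x in enumerate(row))
    n_zero = int(len(cells) * sparsity)
    pruned = [row[:] for row in weights]
    for _, i, j in cells[:n_zero]:
        pruned[i][j] = 0.0
    return pruned


def prune_rows(weights: Matrix, sparsity: float) -> Matrix:
    """Coarse-grained pruning: zero whole rows with the smallest L1 norm."""
    norms = sorted((sum(abs(x) for x in row), i)
                   for i, row in enumerate(weights))
    n_zero = int(len(weights) * sparsity)
    pruned = [row[:] for row in weights]
    for _, i in norms[:n_zero]:
        pruned[i] = [0.0] * len(weights[i])
    return pruned


def kept_magnitude(weights: Matrix) -> float:
    """Total absolute weight that survives pruning (a crude quality proxy)."""
    return sum(abs(x) for row in weights for x in row)


w = [[0.9, 0.1, 0.8, 0.05],
     [0.2, 0.7, 0.1, 0.6],
     [0.3, 0.05, 0.9, 0.1],
     [0.6, 0.2, 0.1, 0.8]]
fine = prune_unstructured(w, 0.5)    # zeros scattered across the matrix
coarse = prune_rows(w, 0.5)          # two entire rows zeroed
```

Both results contain the same number of zeros, but the unstructured version retains more total weight magnitude, while the row-pruned version has a regular layout that conventional hardware can skip cheaply. Retained magnitude is only a rough stand-in for accuracy here.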
Weakly-supervised object localization aims to indicate the category as well as the extent of an object in an image given only image-level labels. Most existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet they ignore the co-occurrence confounder of object and context (e.g., fish and water), which makes it hard for the model to distinguish object boundaries. Besides, the use of CAM also creates a dilemma: classification and localization always suffer from a performance gap and cannot reach their highest accuracy simultaneously. In this paper, we propose a causal knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and in resolving the dilemma between classification and localization performance.
In this paper, a semantic communication framework for image transmission is developed. In the investigated framework, a set of servers cooperatively transmit images to a set of users using semantic communication techniques. To evaluate the performance of the studied semantic communication system, a multimodal metric is proposed to measure the correlation between the extracted semantic information and the original image. To meet the ISS requirement of each user, each server must jointly determine the semantic information to be transmitted and the resource blocks (RBs) used for semantic information transmission. We formulate this as an optimization problem that aims to minimize each server's transmission latency while meeting the ISS requirement. To solve this problem, a value decomposition based entropy-maximized multi-agent reinforcement learning (RL) algorithm is proposed, which enables servers to coordinate during training and execute RB allocation in a distributed manner, approaching globally optimal performance with fewer training iterations. Compared to traditional multi-agent RL, the proposed RL algorithm improves the servers' exploration of valuable actions and the probability of finding a globally optimal RB allocation policy based on local observations. Simulation results show that the proposed algorithm can reduce the transmission delay by up to 16.1% compared to traditional multi-agent RL.
Recent GPUs such as the A100 are equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology gives users more flexibility to support both deep learning training and inference workloads, but using it efficiently can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG in order to eliminate the need for tedious manual benchmarking and tuning. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines the benchmark study for MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to employ MIG effectively, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released at https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon.
With the development of technology and the sharing economy, Airbnb, a well-known short-term rental platform, has become the first choice for many young people. Airbnb pricing has long been a problem worth studying. While previous studies achieve promising results, several deficiencies remain: (1) the feature attributes of rentals are not rich enough; (2) the research on rental text information is not deep enough; (3) there are few studies on predicting the rental price in combination with the points of interest (POI) around the house. To address these challenges, we propose a multi-source information embedding (MSIE) model to predict Airbnb rental prices. Specifically, we first select statistical features to embed the original rental data. Second, we generate word feature vectors and emotional scores from three different kinds of text information and combine them to form the text feature embedding. Third, we use the points of interest (POI) around the rental house to generate a variety of spatial network graphs, and learn embeddings of these networks to obtain the spatial feature embedding. Finally, we combine the three modules into multi-source rental representations and use a fully connected neural network to predict the price. Analysis of the experimental results shows the effectiveness of the proposed model.
Domain adaptive detection aims to improve the generalization of detectors on the target domain. To reduce the discrepancy in feature distributions between two domains, recent approaches achieve domain adaptation through feature alignment at different granularities via adversarial learning. However, they neglect the relationship between multiple granularities and different features in alignment, which degrades detection. To address this, we introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning. The key is to encode the dependencies across different granularities, including the pixel, instance, and category levels, simultaneously to align two domains. Specifically, based on pixel-level features, we first develop an omni-scale gated fusion (OSGF) module to aggregate discriminative representations of instances with scale-aware convolutions, leading to robust multi-scale detection. In addition, we introduce multi-granularity discriminators to identify which domain, source or target, samples at different granularities come from. Note that MGA not only leverages instance discriminability in different categories but also exploits category consistency between two domains for detection. Furthermore, we present an adaptive exponential moving average (AEMA) strategy that uses model assessments to guide model updates, improving pseudo labels and alleviating the local misalignment problem, which boosts detection robustness. Extensive experiments on multiple domain adaptation scenarios validate the superiority of MGA over other approaches on the FCOS and Faster R-CNN detectors. Code will be released at https://github.com/tiankongzhang/MGA.
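For readers unfamiliar with exponential moving average updates, the following minimal sketch shows the plain, fixed-decay form on which strategies such as AEMA build. The function name `ema_update`, the dictionary-of-floats parameters, and the constant `decay` are assumptions for illustration; the abstract does not specify how AEMA adapts the update, so that part is not reproduced here.

```python
from typing import Dict

Params = Dict[str, float]


def ema_update(teacher: Params, student: Params, decay: float = 0.99) -> Params:
    """Return teacher parameters moved a small step toward the student.

    Standard fixed-decay EMA (not the paper's adaptive variant):
        teacher <- decay * teacher + (1 - decay) * student
    """
    return {name: decay * teacher[name] + (1.0 - decay) * student[name]
            for name in teacher}


# Usage: the teacher drifts slowly toward a student held at 0.0.
teacher = {"w": 1.0}
student = {"w": 0.0}
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.9)
```

A decay close to 1 makes the teacher change slowly, which tends to smooth out noisy student updates; an adaptive scheme would vary this behavior based on some measure of model quality.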
Although deep learning has made remarkable progress in processing various types of data such as images, text, and speech, deep models are known to be susceptible to adversarial perturbations: perturbations specifically designed and added to the input to make the target model produce erroneous output. Most existing studies on generating adversarial perturbations attempt to perturb the entire input indiscriminately. In this paper, we propose ExploreADV, a general and flexible adversarial attack system that is capable of modeling regional and imperceptible attacks, allowing users to explore various kinds of adversarial examples as needed. We adapt and combine two existing boundary attack methods, DeepFool and the Brendel & Bethge attack, and propose a mask-constrained adversarial attack system, which generates minimal adversarial perturbations under pixel-level constraints, namely "mask-constraints". We study different ways of generating such mask-constraints considering the variance and importance of the input features, and show that our adversarial attack system offers users good flexibility to focus on sub-regions of inputs, explore imperceptible perturbations, and understand the vulnerability of pixels/regions to adversarial attacks. We demonstrate the effectiveness of our system through extensive experiments and a user study.
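The idea of a pixel-level mask-constraint can be sketched as follows: a binary mask decides which input positions a perturbation is allowed to change. This is an illustrative assumption about how such a constraint might be applied, not ExploreADV's implementation; the function name, the flat-list input, and the [0, 1] pixel range are hypothetical, and the DeepFool and Brendel & Bethge attack steps that would produce the perturbation are not reproduced.

```python
from typing import List


def apply_masked_perturbation(x: List[float],
                              delta: List[float],
                              mask: List[int],
                              lo: float = 0.0,
                              hi: float = 1.0) -> List[float]:
    """Add `delta` to `x` only where `mask` is 1, then clip to [lo, hi].

    Positions with mask 0 are left exactly as in the original input.
    """
    return [min(hi, max(lo, xi + di * mi))
            for xi, di, mi in zip(x, delta, mask)]


# Usage: only the first two pixels may be perturbed.
x = [0.2, 0.9, 0.5, 0.4]
delta = [0.1, 0.3, -0.2, 0.25]
mask = [1, 1, 0, 0]
x_adv = apply_masked_perturbation(x, delta, mask)
```

In this example the last two pixels stay unchanged regardless of `delta`, and the second pixel is clipped at the upper bound of the valid range.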
Depression is a leading cause of death worldwide, and the diagnosis of depression is nontrivial. Multimodal learning is a popular solution for the automatic diagnosis of depression, but existing works suffer from two main drawbacks: 1) the high-order interactions between different modalities cannot be well exploited; and 2) the interpretability of the models is weak. To remedy these drawbacks, we propose a multimodal multi-order factor fusion (MMFF) method. Our method exploits the high-order interactions between different modalities by extracting and assembling modality factors under the guidance of a shared latent proxy. We conduct extensive experiments on two recent and popular datasets, E-DAIC-WOZ and CMDC, and the results show that our method achieves significantly better performance than existing approaches. In addition, by analyzing the process of factor assembly, our model can intuitively show the contribution of each factor, which helps us understand the fusion mechanism.